-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pushdown params to disable time range constraint pushdown #1216
Use pushdown params to disable time range constraint pushdown #1216
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
10a9d01
to
bfb75ba
Compare
26dd729
to
9435008
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good! Left some in-line suggestions to improve readability, but feel free to address those at the top of the stack.
@@ -238,21 +242,30 @@ def _build_aggregated_conversion_node( | |||
constant_properties: Optional[Sequence[ConstantPropertyInput]] = None, | |||
) -> BaseOutput: | |||
"""Builds a node that contains aggregated values of conversions and opportunities.""" | |||
# Pushdown parameters vary with conversion metrics due to the way the time joins are applied. | |||
# Due to other outstanding issues with conversion metric fitlers, we disable predicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: fitlers
time_range_constraint: Optional[TimeRangeConstraint] | ||
_PREDICATE_METADATA_KEY = "is_predicate_property" | ||
|
||
time_range_constraint: Optional[TimeRangeConstraint] = dataclasses.field(metadata={_PREDICATE_METADATA_KEY: True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last was pretty hard to understand without diving into the dataclasses docs a bit & running the code to see what the value actually is. Might be helpful to leave a comment to explain that this is not a typical default arg (i.e., you still need to pass a value in for time_range_constraint
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, to be honest I can probably get rid of this and just keep a list of predicate key names. That's what I had originally and I think it's a little less magical overall, but it's still a lot of magic string handling.
if self.pushdown_state is PredicatePushdownState.FULLY_ENABLED: | ||
return | ||
|
||
invalid_predicate_property_names = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this essentially just checking that time_range_constraint
is always None
when pushdown_state
is DISABLED
?
If so, this seems pretty complicated & hard to read for just that functionality, and it might be helpful to simplify by just checking that one scenario. But maybe I'm missing something or this gets used more generically down the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I definitely should've explained that in the commit message. There will be other parameters added which will also need to be nulled out for disabled. Will update things accordingly.
I'm essentially trying to enforce that when the state is disabled there's no possibility of somebody doing something like if pushdown_params.time_range_constraint or pushdown_params.where_filter_specs
and then triggering a pushdown operation. If the disabled state was binary I could just make it a derived property but because of our conversion metric and time window handling I have to deal with things like time range only and categorical only pushdown states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
) | ||
|
||
@property | ||
def is_pushdown_enabled(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be helpful to call this is_pushdown_enabled_for_time_range
or something more specific so that other devs don't mistake this as an indication that pushdown is FULLY enabled!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't exactly what this does, but to your point inverting it to is_pushdown_disabled might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a ton of updates and now this is EXACTLY what that does. I also put in the disabled thing but I'll probably remove it. I think I should stop tinkering here and move ahead, though.
bfb75ba
to
55ed28e
Compare
9435008
to
8d77d33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @courtneyholcomb ! I'm still playing around with the expanded pushdown mechanics so I'll come back to this before merge and either put up another PR or revise this one and request a follow up, depending on what I want to do.
time_range_constraint: Optional[TimeRangeConstraint] | ||
_PREDICATE_METADATA_KEY = "is_predicate_property" | ||
|
||
time_range_constraint: Optional[TimeRangeConstraint] = dataclasses.field(metadata={_PREDICATE_METADATA_KEY: True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, to be honest I can probably get rid of this and just keep a list of predicate key names. That's what I had originally and I think it's a little less magical overall, but it's still a lot of magic string handling.
if self.pushdown_state is PredicatePushdownState.FULLY_ENABLED: | ||
return | ||
|
||
invalid_predicate_property_names = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I definitely should've explained that in the commit message. There will be other parameters added which will also need to be nulled out for disabled. Will update things accordingly.
I'm essentially trying to enforce that when the state is disabled there's no possibility of somebody doing something like if pushdown_params.time_range_constraint or pushdown_params.where_filter_specs
and then triggering a pushdown operation. If the disabled state was binary I could just make it a derived property but because of our conversion metric and time window handling I have to deal with things like time range only and categorical only pushdown states.
) | ||
|
||
@property | ||
def is_pushdown_enabled(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't exactly what this does, but to your point inverting it to is_pushdown_disabled might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -61,7 +61,7 @@ class MultiHopJoinCandidate: | |||
lineage: MultiHopJoinCandidateLineage | |||
|
|||
|
|||
class PredicatePushdownState(Enum): | |||
class PushdownPredicateInputType(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change!!
55ed28e
to
88723e2
Compare
dc94cfc
to
31ee2df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good!
88723e2
to
ece1e07
Compare
31ee2df
to
3a02ea7
Compare
ece1e07
to
7e723bc
Compare
The predicate pushdown operations for time range constriants currently rely on None/not-None state to determine whether or not to push the ConstrainTimeRange filter node to the source. With the switch to pushdown params this None/not-None state gets tenuous, as future updates will need to do things like manage time range constraints, other time dimension filters, and categorical dimension (and entity) filters, and relying on externalizing None/not-None combinations for these various filter types will be challenging. This change encapsulates the enabling and disabling of time range constraints inside of PredicatePushdownParams. At the moment, in order to maintain the existing behavior, we simply internalize the None/not-None behavior for time range constraints. This will also allow us to easily retain pushdown processing for categorical dimensions even when time constraint filters should not be applied, and gradually centralize those controls as we streamline the callsites. There is added complexity to this change because of two things. 1. Time constraint updating is scattered around in the DataflowPlanBuilder 2. Conversion metrics currently do predicate pushdown for time constraints The first of these will be addressed later. Since we cannot reliably support general predicate pushdown for conversion metrics, we need to allow for a time-range-only pushdown operation. Today this is equivalent to pushdown enabled, but that will change shortly.
3a02ea7
to
9769994
Compare
The predicate pushdown operations for time range constriants currently
rely on None/not-None state to determine whether or not to push the
ConstrainTimeRange filter node to the source. With the switch to
pushdown params this None/not-None state gets tenuous, as future
updates will need to do things like manage time range constraints,
other time dimension filters, and categorical dimension (and entity)
filters, and relying on externalizing None/not-None combinations for
these various filter types will be challenging.
This change encapsulates the enabling and disabling of time range
constraints inside of PredicatePushdownParams. At the moment, in
order to maintain the existing behavior, we simply internalize the
None/not-None behavior for time range constraints. This will also
allow us to easily retain pushdown processing for categorical dimensions
even when time constraint filters should not be applied, and
gradually centralize those controls as we streamline the callsites.
There is added complexity to this change because of two things.
The first of these will be addressed later.
Since we cannot reliably support general predicate pushdown for conversion
metrics, we need to allow for a time-range-only pushdown operation. Today
this is equivalent to pushdown enabled, but that will change shortly.